Intro

For this analysis of HappyDB, I wanted to follow a personal curiosity: what brings happiness to peers in my own age group (26-30)? A popular claim holds that girls mature, or “reach adulthood”, a few years earlier than boys.

Can we make some inferences from the happy moments of millennials? How do males and females differ in this regard?

Overall, it appears that females draw more happiness from bonding than from achievement, while males put achievement first.

What words occur most frequently?

“Friends”, “day”, and “time” all feature heavily for both sexes, but the #4 word for females is “husband”, while for males it is “played”; “wife” does not appear until #10.
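
As a rough illustration of how such a ranking can be produced, here is a minimal sketch using `tidytext`. The `moments` table and its contents are made up for the example and are not HappyDB data; the column names `gender` and `text` are assumptions, not the names used in the original analysis.

```r
library(dplyr)
library(tidytext)

# Toy stand-in for the happy moments (made-up text, NOT real HappyDB data)
moments <- tibble(
  gender = c("f", "f", "m", "m"),
  text = c("My husband cooked dinner for me",
           "My husband took me to the beach",
           "I watched basketball with my friends",
           "We won at basketball again")
)

word_counts <- moments %>%
  unnest_tokens(word, text) %>%            # one lowercase word per row
  anti_join(stop_words, by = "word") %>%   # drop common stop words such as "my"
  count(gender, word, sort = TRUE)         # frequency of each word per gender

# Keep the three most frequent words for each gender
top_words <- word_counts %>%
  group_by(gender) %>%
  slice_max(n, n = 3, with_ties = FALSE) %>%
  ungroup()

print(top_words)
```

Counting after removing stop words keeps filler such as “my” or “the” from crowding out the content words that the ranking is meant to surface.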

Selecting words by relative importance
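
One common way to measure relative importance is tf-idf, which down-weights words that are frequent in every group. The sketch below assumes each gender is treated as one “document”; the counts are invented for illustration and the table layout is an assumption about how the data might be arranged.

```r
library(dplyr)
library(tidytext)

# Hypothetical per-gender word counts (invented numbers for illustration)
word_counts <- tibble(
  gender = c("f", "f", "f", "m", "m", "m"),
  word   = c("friends", "husband", "day", "friends", "played", "day"),
  n      = c(5, 4, 3, 5, 4, 3)
)

word_tf_idf <- word_counts %>%
  bind_tf_idf(word, gender, n) %>%   # adds tf, idf and tf_idf columns
  arrange(desc(tf_idf))

print(word_tf_idf)
```

Words shared by both groups (“friends”, “day”) get an idf of zero here, so only the words specific to one group (“husband”, “played”) keep a positive score.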


What if we look at pairs of words that occur together (bigrams)?

The #1 bigram for men aged 26-30 is… video games! “Played video” and “played games” appear as well.
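
For readers who want to reproduce this kind of count, a minimal bigram sketch follows. It again uses a made-up `moments` table with assumed column names, and it drops any bigram in which either word is a stop word, which is one common convention rather than necessarily the one used here.

```r
library(dplyr)
library(tidyr)
library(tidytext)

# Toy stand-in for the happy moments (made-up text, NOT real HappyDB data)
moments <- tibble(
  gender = c("m", "m", "f"),
  text = c("I played video games with friends",
           "We played video games all night",
           "My husband planned a surprise")
)

bigram_counts <- moments %>%
  unnest_tokens(bigram, text, token = "ngrams", n = 2) %>%   # consecutive word pairs
  separate(bigram, c("word1", "word2"), sep = " ") %>%
  filter(!word1 %in% stop_words$word,                        # drop pairs containing
         !word2 %in% stop_words$word) %>%                    # a stop word
  unite(bigram, word1, word2, sep = " ") %>%
  count(gender, bigram, sort = TRUE)

print(bigram_counts)
```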


But if we rank the bigrams by their relative importance (tf-idf), the results are somewhat different. “Promotion”, “living life”, and “dating girlfriend” take the lead for men, while “husband surprise” is #1 for women.

Topic Modelling
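
The code further down starts from a table called `hm_topics` with `topic`, `term`, and `beta` columns. The original construction of that table is not shown, so the following is only a sketch of one way such a table could be built, assuming a two-topic LDA model fitted with the `topicmodels` package. The documents, words, and counts are invented for illustration.

```r
library(dplyr)
library(tidytext)
library(topicmodels)

# Hypothetical per-document word counts (invented for illustration)
word_counts <- tibble(
  doc  = c("d1", "d1", "d2", "d2", "d3", "d3"),
  word = c("games", "played", "husband", "dinner", "games", "friends"),
  n    = c(3, 2, 3, 2, 2, 2)
)

# Cast the tidy counts to a document-term matrix, the input LDA() expects
hm_dtm <- word_counts %>%
  cast_dtm(doc, word, n)

# Fit a two-topic model; the seed only makes the run repeatable
hm_lda <- LDA(hm_dtm, k = 2, control = list(seed = 1234))

# One row per topic-term pair; beta is the per-topic word probability
hm_topics <- tidy(hm_lda, matrix = "beta")

print(hm_topics)
```

With `k = 2`, each term gets one beta per topic, which is what makes the topic1-versus-topic2 comparison below possible.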

Words with the greatest difference in probability (beta) between the two topics:

library(dplyr)
library(tidyr)
library(ggplot2)

# Put each topic's beta in its own column, keep terms that matter in at
# least one topic, and compute the log2 ratio of topic 2 to topic 1
beta_spread <- hm_topics %>%
  mutate(topic = paste0("topic", topic)) %>%
  spread(topic, beta) %>%
  filter(topic1 > .001 | topic2 > .001) %>%
  mutate(log_ratio = log2(topic2 / topic1)) %>%
  arrange(log_ratio)

# Keep the 10 most negative and 10 most positive log ratios
# (head/tail avoid hard-coding the number of rows)
beta_spread_fin <- rbind(head(beta_spread, 10), tail(beta_spread, 10))

# Refer to columns by name inside aes() rather than with `$`
ggplot(beta_spread_fin, aes(x = reorder(term, -log_ratio), y = log_ratio)) +
  geom_col() +
  coord_flip()

Summary:

  1. While playing video games features prominently among the happy moments of men aged 26-30, they draw similar happiness from promotions and other achievements, and from moments of affection (dating their girlfriends, or their wives giving birth).

  2. For women, family heavily influences happy moments. Husbands, sons, and daughters come up more than I initially expected, whereas the word “boyfriend” occurs far less frequently.